[57] ABSTRACT

In a device for converting visual images into representative sound
information, especially for visually handicapped persons, an image
processing unit is provided with a pipelined architecture with a high
level of parallelism. An image is scanned in sequential vertical
scanlines and the acoustical representations of the scanlines are
produced in real time. Each scanline acoustical representation is
formed by sinusoidal contributions from each pixel in the scanline,
the frequency of the contribution being determined by the position of
the pixel in the scanline and the amplitude of the contribution being
determined by the brightness of the pixel.

9 Claims, 7 Drawing Sheets 



U.S. Patent    Mar. 17, 1992    Sheets 1 to 7 of 7    5,097,326

[Drawing sheets 1 to 7: figures not reproduced in this text. The only
labels legible on Sheet 7 are the component designations 1/2 AD644JH
(twice), AD7546 (DAC) and 1/4 TL084.]

09/15/2003, EAST Version: 1.04.0000






IMAGE-AUDIO TRANSFORMATION SYSTEM 

BACKGROUND OF THE INVENTION 

1. Field of the Invention
The invention relates to a transformation system for converting visual
images into acoustical representations.

2. Description of the Related Art 

An article by L. Kay in IEE Proceedings, Vol. 131, No. 7, September
1984, pp. 559-576, reviews several mobility aids for totally blind or
severely visually handicapped persons. With some of these aids visual
information is converted into acoustical representations, e.g. the
laser cane, but the amount of visual information conveyed is very low.
In fact, these systems are mainly electronic analogs or extensions of
the ordinary (long) cane, as they are obstacle detectors for a single
direction pointed at. Direct stimulation of the visual cortex has also
been tried, but up to now only with poor success. The disadvantage of
having to apply brain surgery is an obvious obstacle in the
development of this approach. Another possibility is to display an
image by a matrix of tactile stimulators, using vibrotactile or
electrocutaneous stimulation. The poor resolution of present
modest-sized matrices may be a major reason for a lack of success in
mobility.

Another approach mentioned in the Kay article is to convert acoustical
representations. With this approach, called sonar, the problem of
ambiguity arises, because very different configurations of obstacles
may conceivably yield almost the same acoustic patterns. Another
problem is that the complexity of an acoustic refraction pattern is
very hard to interpret and requires extensive training. Here too the
spatial resolution is rather low due to a far from optimal
exploitation of available bandwidth. The range of sight is rather
restricted in general. The document WO 82/00395 (which corresponds to
U.S. Pat. No. 4,378,569) discloses a system in which an image sensed
by a video signal generator is converted into sound information
through a number of circuit channels corresponding with the number of
pixels in a line of the image. This results in bulky circuitry,
mismatches in which may cause inaccuracies in the sound information
generated.

SUMMARY OF THE INVENTION 

It is an object of the invention to provide a transformation system in
which the restrictions of known devices are avoided to a large extent,
inaccuracy is avoided and the amount of circuit elements is
substantially reduced, enabling faithful real-time transformation. To
this end a transformation system, recited in the preamble, is in
accordance with the invention characterized in that the transformation
system implements the transformation in a manner which reduces the
number of hardware processing channels down from a number
corresponding with the number of image pixels to be transformed to
derive an acoustic signal section.

The invention is based on the following basic considerations.

True visual input seems the most attractive, because it is known to
provide an adequate description of the environment without distance
limitations. Therefore a camera should be used as a source of data to
be converted.

An acoustic representation will be used, because the human hearing
system probably has, after the human visual system, the greatest
available bandwidth. The greater the bandwidth, the larger the amount
of information that can be transmitted in any given time interval.
Furthermore, the human hearing system is known to be capable of
processing and interpreting very complicated information, e.g. speech
in a noisy environment, considering the flexibility of the human
brain.

The mapping of an image into sound should be extremely simple from the
viewpoint of a blind person. It must be understood at least for simple
pictures in the beginning by any normal human being without much
training and effort. This reduces psychological barriers.

A scanline approach will be used to distribute an image in time. Thus
a much higher resolution for a given bandwidth is obtained. The time
needed to transfer a single image must remain modest to make sure the
entire image is grasped by short-term human memory for further
interpretation. The transfer time must also be as small as possible to
refresh an image as often as possible in order to have an up-to-date
representation of reality, such as for moving objects. An upper limit
to the conversion time will therefore be on the order of a few
seconds.

The image-to-sound conversion must occur in real- 
time to be useful. 

The system should be portable, low-power and low- 
cost to be suitable for private use in battery-operated 
mobility applications. 

In a preferred embodiment storage means are incor- 
porated to store digitized images so that image blurring 
during the conversion is eliminated. 

A preferred embodiment of the invention is provided with a low-weight
(portable), low-power (battery-fed) image sensing system preferably
incorporating digital data processing, such as a CCD camera.

In a further preferred embodiment an image processing unit is based on
orthogonally scanning a grid of pixels, each pixel having one of a row
of discrete brightness values. Such a grid may be built up by, for
example, 64 x 64 pixels with, for example, sixteen possible brightness
levels for every pixel.
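
As an illustration of such a grid, the following minimal Python sketch
quantizes normalized brightness values into sixteen discrete levels.
The function names and the uniform quantization rule are assumptions
made for this sketch only; the text does not prescribe how the levels
are assigned, and the detailed design later refers to a 4-bit
Gray-code analog-to-digital converter.

```python
def quantize_brightness(value, levels=16):
    """Map a brightness in [0.0, 1.0] to an integer level 0..levels-1.

    Uniform quantization is an assumption of this sketch.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("brightness must lie in [0.0, 1.0]")
    # value == 1.0 would give `levels`, so clamp to the top level.
    return min(int(value * levels), levels - 1)


def quantize_image(image, levels=16):
    """Quantize every pixel of an image given as a list of rows."""
    return [[quantize_brightness(v, levels) for v in row] for row in image]
```

With the default of sixteen levels each pixel fits in four bits, which
matches the 4-bit amplitude measure mentioned in the detailed design.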

A further embodiment comprises pipelined digital 
data processing for converting digitized image informa- 
tion into acoustical representation. Preferably a high 
degree of parallel processing is used in order to improve 
converting speed using the available structural process- 
ing lines efficiently. 
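
The detailed description later in the text emulates 64
amplitude-controlled oscillators with a 2 MHz system clock, adding a
per-oscillator phase change to a stored phase and summing scaled sine
values into one output sample. A minimal Python sketch of that
computation is given below. It runs the steps one after another in a
loop, whereas the hardware overlaps them in pipeline stages and looks
the scaled sine up in memory; the names are hypothetical and the use
of floating-point `math.sin` is an assumption of the sketch.

```python
import math

N_OSCILLATORS = 64
SYSTEM_CLOCK_HZ = 2_000_000
# One output sample per 64 clock cycles: 31250 samples per second.
SAMPLE_RATE_HZ = SYSTEM_CLOCK_HZ / N_OSCILLATORS


def phase_steps(freqs_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Phase change per output sample for each oscillator (radians)."""
    return [2.0 * math.pi * f / sample_rate_hz for f in freqs_hz]


def next_sample(phases, steps, amplitudes):
    """Advance every oscillator by one time step and sum the results.

    `phases` is updated in place, playing the role of the phase memory.
    """
    total = 0.0
    for i, step in enumerate(steps):
        phases[i] = (phases[i] + step) % (2.0 * math.pi)
        total += amplitudes[i] * math.sin(phases[i])
    return total
```

Calling `next_sample` repeatedly with the amplitudes of the current
scanline yields the sample stream for that scanline.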

A transformation system according to the invention is preferably
suited to read out an image scanline-wise, for example in 64 scanlines
with 64 pixels on each scanline. Preferably a scanline position is
represented in time sequence and a pixel position is represented in
frequency or vice versa while the brightness of a pixel is represented
in amplitude of an acoustical representation.
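
The mapping just described (scanline position to time, pixel position
to frequency, brightness to amplitude) can be sketched in a few lines
of Python. This is an illustrative sketch, not the patented circuit:
the function names, the list-of-columns image layout (each column
listed from bottom pixel to top pixel) and the default timing values
are assumptions. For simplicity each scanline restarts its sines at
phase zero, whereas the hardware described later keeps a running phase
per oscillator.

```python
import math


def column_to_samples(column, freqs_hz, duration_s, sample_rate_hz):
    """Sum one sine per pixel: frequency from position, amplitude from
    brightness. `column[i]` is paired with `freqs_hz[i]`."""
    n = int(round(duration_s * sample_rate_hz))
    samples = []
    for k in range(n):
        t = k / sample_rate_hz
        samples.append(sum(a * math.sin(2.0 * math.pi * f * t)
                           for a, f in zip(column, freqs_hz)))
    return samples


def image_to_samples(columns, freqs_hz, seconds_per_image=1.0,
                     sample_rate_hz=31250):
    """Play the columns one after another, left to right."""
    duration_s = seconds_per_image / len(columns)
    out = []
    for column in columns:
        out.extend(column_to_samples(column, freqs_hz, duration_s,
                                     sample_rate_hz))
    return out
```

A bright diagonal from bottom left to top right then produces a tone
that rises in pitch over the duration of the image, as the description
notes further on.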

In a further embodiment binaural stimulation with left and right
acoustic representations is used to support the feeling for direction
of movement. Two image sensors positioned a distance apart
corresponding with the eye separation, or simulating this geometry,
can be used to incorporate viewing for a better relative distance
perspective.

Furthermore, the frequency distribution in and/or 
the duration of an acoustic representation can be made 
user programmable and/or selectable. 
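
Because the frequency distribution is selectable, two obvious
candidates are an equidistant set and a geometric set in which each
frequency is a constant factor times the previous one (the
"wohl-temperierte" set of a preferred embodiment described later). The
Python sketch below generates both. The function names and the 500 Hz
to 5000 Hz range in the usage note are assumptions for illustration;
the text itself only mentions an estimated bandwidth of 5 kHz.

```python
def equidistant_frequencies(f_low_hz, f_high_hz, n):
    """n frequencies with a constant difference between neighbours."""
    if n < 2:
        raise ValueError("need at least two frequencies")
    step = (f_high_hz - f_low_hz) / (n - 1)
    return [f_low_hz + i * step for i in range(n)]


def geometric_frequencies(f_low_hz, f_high_hz, n):
    """n frequencies with a constant ratio between neighbours."""
    if n < 2:
        raise ValueError("need at least two frequencies")
    ratio = (f_high_hz / f_low_hz) ** (1.0 / (n - 1))
    return [f_low_hz * ratio ** i for i in range(n)]
```

For example, `geometric_frequencies(500.0, 5000.0, 64)` gives 64
frequencies whose neighbours differ by a factor of 10 ** (1 / 63),
roughly 3.7 percent per pixel position.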
BRIEF DESCRIPTION OF THE DRAWING

Some embodiments according to the invention will be described in more
detail with reference to the drawing. In the drawing:

FIG. 1 is a pictorial illustrating basic principles of the
transformations performed by the invention,

FIG. 2 is a block diagram of a system to perform transformations
analogous to the transformations of FIG. 1,

FIG. 3 is a block diagram illustrating the design and operation of a
waveform synthesis stage within an image processing unit of FIG. 2,

FIG. 4 is a flow chart showing the algorithmic structure of the image
processing unit of FIG. 2,

FIG. 5 is an illustration showing more detail of the control logic of
the image processing unit of FIG. 4,

FIG. 6 is a block diagram of a Gray-code analog-to-digital converter
for the image processing unit of FIG. 4,

FIGS. 7 and 8 are block diagrams illustrating an analog output stage
for the image processing unit of FIG. 4,

FIG. 9a is a detailed schematic of a differentiation element in FIG.
5a,

FIG. 9b is a timing diagram relating input and output to the schematic
of FIG. 9a.

In order to put the description of the invention in perspective, it is
useful to illustrate with a very simplified example, as given in FIG.
1, the principles of the way the invention transforms visual images
into acoustical representations. The particular orientations and
directions indicated in FIG. 1 and described below are not essential
to the invention. For example, the transformation may be reconfigured
for scanning from top to bottom instead of from left to right, without
violating the principles of the invention. Other examples are the
reversal of directions, such as scanning from right to left, or having
high frequencies at low pixel positions etc. However, for the purpose
of illustration particular choices for orientations and directions
were made in the description, in accordance with FIG. 1, unless stated
otherwise. Similarly, the particular number of rows, columns or
brightness values in the image is not at all fundamental to the
transformation. The example of FIG. 1 indicates for the sake of
simplicity eight rows and eight columns with three brightness values,
or grey-tones, per pixel. A more realistic example of a transformation
system for visual images further described has 64 rows and 64 columns,
with sixteen brightness values per pixel.

FIG. 1 shows a chess-board-like visual image 9 partitioned into eight
columns 1 through 8 and eight rows 1 through 8, giving 64 pixels 10.
For simplicity the brightness of the image pixels can have one of
three grey-tones: white, grey or black. This image can be considered
to be scanned in successive vertical scanlines coinciding with any of
the columns 1 through 8. An image processing unit 11 containing a
digital representation of such an image, converts the vertical
scanlines one after another into sound, in accordance with a
particular scanning sequence, here from left to right. For any given
vertical scanline, the position of a pixel in the column uniquely
determines the frequency of an oscillator signal, while the brightness
of this pixel uniquely determines the amplitude of this oscillator
signal. The higher the position of a pixel in the column, the higher
the frequency, and the brighter the pixel, the larger the amplitude.
Signals 12 of all oscillator signals in the column of a particular
scanline are summed and converted with the aid of a
converting-summing unit 14 into acoustical signals 16. After this
total signal has sounded for some time, the scanline moves to the next
column, and the same conversion takes place. After all columns of the
images have thus been converted into sound, a new and up-to-date image
is stored, and the conversion starts anew. At this point in time the
scanline jumps from the last, here rightmost column, to the first,
here leftmost column of the image.

Due to the simplicity of this transformation, no training will be
needed to interpret simple pictures. For example, one may consider a
straight white line on a black background, running from the bottom
left corner to the top right corner of the image. This will obviously
result in a tone that starts having a low pitch and that increases
steadily in pitch, until the high pitch of the top right pixel is
reached. This sound is repeated over and over as a new frame (here
with the same contents) is grabbed every few seconds. A white vertical
line on a black background will sound as bandwidth-limited noise with
a duration corresponding to the width of the line etc. After
understanding the transformation of simple pictures, understanding the
sound generated from transformation of more complicated pictures will
require more training, as learning a new language does. But initially
there is the bonus that understanding what sounds are generated from
simple pictures may enhance the motivation of a user of the
transformation system to proceed in practicing.

An example of a system that performs image-to-sound conversion is
depicted in FIG. 2. Images 20 are transformed into electronic signals
by an image sensing unit 22 provided with a unit 24 for conversion of
images into electronic signals, such as a camera. These electronic
signals, which include synchronization signals, are processed by an
image processing unit 26, in which a number of transformations take
place. The image processing unit takes care of the conversion of
analog electronic image signals into digital signals, which are stored
in digital memory 28, after which digital data processing and waveform
synthesis in a data processing and waveform synthesis unit 30 yields a
digitized waveform, which is finally converted into analog electronic
signals in a D/A conversion and analog output stage 32 for a sound
generating output unit 34. The sound generating output unit could
include headphones or any other system for converting electronic
signals into sound, which does not exclude the possibility of using
intermediate storage, such as a tape recorder. The image processing
unit and, if they require a power supply, also the image sensing unit
and/or the sound generation unit are powered by a power supply 38 as
indicated in FIG. 2. Dashed lines indicate this condition, since for
example headphones normally do not require an additional power supply.
The power supply may be configured for battery operation.

In the following, architectural considerations for an image processing
unit and in particular the digital data processing and waveform
synthesis unit within the image processing unit, as indicated in FIG.
2, are described. Transformation of image into sound can be seen as a
flow of data, undergoing a relatively complicated transformation. This
complicated transformation can be
decomposed into different processing stages, while reducing the
complexity of the transformations taking place in individual stages.
This will in general also allow for a reduction in processing time per
stage. Data leaving a stage can be replaced by data leaving the
previous stage. The processing of a number of data takes place in
parallel, although in different stages of the total transformation.
Thus a much larger data flow can be achieved by organizing the
architecture as a pipeline, i.e. a sequence, of simple stages. In the
design for the invention such a pipelined architecture has been
applied. Without this kind of parallelism, a fast mainframe would have
been needed to get the same real-time response. Obviously, the system
would then not be portable, nor low-cost, nor low-power. The image
processing unit can be seen as a special purpose computer with
mainframe performance capabilities with respect to its restricted
task, enabling real-time image-to-sound conversion using a clock
frequency of only 2 MHz. This in turn enables the use of standard
normal-speed components like EPROMs (Erasable Programmable Read-Only
Memories) with 250 ns access times and static RAMs (static
Random-Access Memories) with 150 ns access times, thereby reducing the
cost of the system. Only the two phases of a single system clock are
used for synchronization. The system emulates the behaviour of a
superposition of 64 amplitude-controlled independent oscillators in
the acoustic frequency range. There are reasons for selecting this
number, although other numbers can be used without violating the
principles of the invention. Yet, for simplicity of the description
the number 64 and numbers derived from it are used on many occasions
without further discussion. The 64 oscillators do not physically exist
as 64 precisely tuned oscillator circuits, because this would probably
give an unacceptably high system cost and size. Instead, the system
employs the available digital speed to calculate in real-time what the
superposition of 64 independent oscillators, having amplitudes
controlled by the transformation, would look like. To do this, it
calculates a 16-bit sample of this superposition every 32
microseconds, thereby giving a 16-bit amplitude sample frequency of
31.25 kHz of the resulting acoustic signal (which, for comparison, is
close to the 44.1 kHz 16-bit amplitude sample frequency of
CD-players). The contributions to the superposition from all 64
oscillators must be determined in the 32 microseconds; this allows for
500 nanoseconds per oscillator contribution. A sufficiently parallel
system will therefore be able to do this job with a system clock
frequency of 2 MHz.

FIG. 3 illustrates an algorithmic structure of the synthesis, by
indicating schematically how a single waveform sample of the
superposition waveform is calculated within the image processing unit
40. This single waveform sample corresponds directly with a single
sound amplitude sample. The algorithmic structure of FIG. 3 can also
be viewed as a block diagram of the waveform synthesis stage within
the image processing unit of FIG. 2. For a particular scanline, as in
the example of FIG. 1, all pixels i on that scanline are processed.
For a pixel i, a new phase φi + Δφi is calculated, by adding a phase
increment from a phase change memory to a previous phase retrieved
from accumulated phase memory. The result becomes the argument of a
sine function in a scaled sine module 42, the resulting sine value
being multiplied by a pixel brightness value Ai. In the image
processing unit, this multiplication need not physically take place,
but the results of one of all the possible multiplications may be
retrieved from a memory module 42. The scaled sine value resulting
from this (implicit) multiplication is added to results from previous
pixels. When the results for all pixels i, 64 in the detailed design,
have thus been accumulated, the overall result is a single sample of
the emulated superposition of 64 oscillators.

In the waveform synthesis unit indicated schematically in FIG. 3,
several pixels are processed simultaneously, although in different
stages of the pipelined processing. A more detailed description of
this parallelism is given in the following. In the system the
following operations take place in parallel, once the image-to-sound
conversion has started:

At clock phase L(ow), duration 250 ns, an address is calculated for
phase-change memory containing the phase changes per time step of each
of the 64 oscillators. This time step is the step corresponding to the
final sample frequency of 1/64 times the system clock frequency (here
2 MHz), i.e. 32 us. The above address corresponds to the phase change
of one particular oscillator. The new phase of another oscillator
(calculated in earlier cycles) is stored, and at the same time used as
an address for sine memory containing the sine as a function of phase,
scaled by an amplitude measure. The 4-bit amplitude measure is
simultaneously provided by video memory containing the frame as it was
grabbed by a 4-bit analog-to-digital converter (ADC). The sum of
scaled sine samples that came from previously handled oscillators is
stored in a register. This is needed to obtain one final sample of the
superposition of 64 amplitude-controlled digital oscillators.

At clock phase H(igh), duration 250 ns, an address is obtained for
phase memory containing the present phases of all oscillators. The
phase of an oscillator is read from this memory and added to its phase
change, obtained from phase-change memory for which the address was
calculated at clock phase L. A new address for the video memory is
calculated. This address corresponds to the next pixel to be
processed. A scaled sine value is read from sine memory. The scaled
sine value of another oscillator is added to the sum of scaled sine
values of previously handled oscillators. This is part of the process
of calculating the superposition of 64 oscillators.

After 64 of such system clock cycles, i.e. 32 us, a sample of the
superposition of the 64 amplitude-controlled oscillators is ready.
This value can be sent to a 16-bit digital-to-analog converter DAC 50
and an analog output stage 52, and from there to a sound generating
output unit such as a headphone 54 (FIG. 7).

In the above description, the clock phases H and L could have been
interchanged, if this is done consistently throughout the text. The
particular choice made is not fundamental, but a choice must be made.

As described above, both the sine amplitude scaling and the
frequencies of the oscillators are determined by the contents of
memory. The main reason for using memory instead of dedicated hardware
is that this provides the ability to experiment with arbitrary
mappings. An amplitude scaling other than multiplication by pixel
brightness may give better contrast as perceived by the user.
Non-equidistant oscillator frequencies will certainly give better
frequency resolution as perceived by the user, because the human
hearing system is more sensitive to differences in pitch at the lower
end of the spectrum: In a preferred embodiment a "wohl-temperierte"
set of frequencies is used, meaning that the
next higher frequency is a constant factor times the previous
frequency (i.e. they increase geometrically with pixel position). The
resulting non-equidistant frequencies appear subjectively as almost
equidistant tones. Another important reason for avoiding dedicated
hardware is the cost. Memory is cheap, provided the transformation to
be performed is simple enough to fit in a few memory chips.
Additionally, a stereoscopic pair of images can be processed
simultaneously in a similar manner to generate separate left and right
acoustic representations suitable for application to headphones.

In a preferred embodiment a video camera is used as the input source.
Whenever the previous image-to-sound conversion has finished, a video
frame is grabbed and stored in electronic memory in the single frame
time of the camera, e.g. one 50th of a second in most European video
cameras, and one 60th of a second in most American video cameras. This
is to avoid blurred images due to moving objects or a moving camera.
In the detailed design descriptions the use of the one 50th second
frame time of the PAL television standard is assumed for the choice of
time constants used to grab the video signal, but this is not
fundamental to the invention. The image is stored as 64 x 64 pixels
with one of 16 possible brightness values, i.e. grey-tones, for each
pixel. Other numbers of pixels or brightness values could have been
used, without violating the principles of the invention, but there are
practical reasons for this particular choice. Then the next
image-to-sound conversion starts. The electronic image can be
considered as consisting of 64 vertical lines, each having 64 pixels.
First the leftmost vertical line is read from memory. Every pixel in
this line is used to excite an associated digital harmonic oscillator.
A pixel positioned higher in the vertical line corresponds to an
oscillator of higher frequency (all in the acoustic range). The
greater the brightness of a pixel, the larger the amplitude of its
oscillator signal. Next all 64 oscillator signals are summed
(superposed) to obtain a total signal with 64 Fourier components. This
signal is sent to a DA-converter and output through headphones. After
about 16 milliseconds the second (leftmost but one) vertical line is
read from memory and treated the same way. This process continues
until all 64 vertical lines have been converted into sound, which
takes approximately one second. Then a new video frame is grabbed and
the whole process repeats itself.

As stated before, the image is assumed to be scanned from the left to
the right, without excluding other possibilities for scanning
directions. The horizontal position of a pixel is then represented by
the moment at which it excites its emulated oscillator. The vertical
position of a pixel is represented by the pitch of its associated
oscillator signal. The intensity, or brightness, of a pixel is
represented by the intensity, or amplitude, of the sound corresponding
to its associated oscillator signal.

The human hearing system is quite capable of decomposing a complicated
signal into its Fourier components (music, speech), which is precisely
what is needed to interpret the vertical positions of pixels. The
following sections will discuss the limits of Fourier decomposition in
the bandwidth-limited human hearing system, which will show that more
than 64 independent oscillators would not be useful.

A picture is represented by the system as a time-varying vector of
oscillator signals. The vector elements are the individual oscillator
signals, each representing a pixel vertical position and brightness.
The ear receives the entire vector as a superposition of oscillator
signals. The human hearing system must be able to decompose the vector
into its elements. The hearing system is known to be capable of
performing Fourier decomposition to some extent. Therefore the
oscillator signals will be made to correspond to Fourier components,
i.e. scaled and shifted sines. The hearing system will then be able to
reconstruct the frequencies of individual harmonic oscillators, and
their approximate amplitude (i.e. reconstruct pixel heights and
brightness).

Another related criterion for choosing a particular waveform is that
the image-to-sound conversion should preferably be bijective to
preserve information. A bijective mapping has an inverse, which means
that no information is lost under such a mapping. To preserve a
reversible one-to-one relation between pixels and oscillator signals,
the oscillator signals corresponding to different pixels must
obviously be distinguishable and separable, because the pixels are
distinguishable and separable. The waveforms (functions) involved
should therefore be orthogonal. A superposition of such functions will
then give a uniquely determined vector in a countably infinite
dimensional Hilbert space. A complete (but not normalized) orthogonal
set of basis vectors in this Hilbert space is given by the functions
1, cos nt, sin nt, with n positive natural numbers. Of course only a
small finite subset of these functions will be used due to bandwidth
limitations.

Other reasons for using harmonic oscillators are the following.
Harmonic oscillation is the mechanical response of many physical
systems after modest excitation. This stems from the fact that the
driving force towards the equilibrium position of most solid objects
is in first order approximation linearly dependent on the distance
from the equilibrium position (Hooke's law). The resulting second
order differential equation for position as a function of time has the
sine as its solution (in the text amplitude and phase will implicitly
be disregarded when not relevant, and just the term "sine" will be
used for short), provided the damping can be neglected. The sine is
therefore a basic function for natural sound, and it may be expected
that the human ear is well adapted to it. Furthermore, the
construction of the human ear also suggests that this is the case,
since the basal membrane has a difference in elasticity of a factor
100 when going from base to apex. The transversal and longitudinal
tension in the membrane is very small and will therefore not
contribute to a further spectral decomposition. However, it should be
noted that the brain may also contribute considerably to this
analysis: with electro-physiological methods it has been found that
the discharges in the hearing nerves follow the frequency of a pure
sine sound up to 4 or 5 kHz. Because the firing rate of individual
nerve cells is only about 300 Hz, there has to be a parallel system of
nerves for processing these higher frequencies.

Periodic signals will give rise to a discrete spectrum in the
frequency domain. In practice, periodic signals of infinite duration
cannot be used, because the signal should reflect differences between
successive scanlines and changes in the environment, thus breaking the
periodicity. The Fourier transform of a single sine-piece of finite
duration does not give a single spectral component, but a continuous
spectrum with its maximum centred on its basic frequency. The
requirement is that the spectrum of one sine-piece corresponding to a
particular pixel is clearly distinguishable from the spectrum of
a sine-piece corresponding to a neighbouring pixel in the vertical
direction.

(In the following, I[. , .] denotes "the integral from . to .", while
w denotes the Greek letter omega, w0 denotes omega with subscript 0,
PI denotes the Greek letter pi, etc. In general, a close typographical
analog of the conventional mathematical symbols is used.)

The Fourier transform is given by:

F(w) = I[-inf, +inf] f(t) exp(-jwt) dt

A sine-piece of finite duration can be regarded as the multiplication
of a sine of infinite duration with a pulse of finite duration. This
situation corresponds to a bright pixel at a particular height in a
(vertical) scanline, surrounded by dark pixels at the same height in
neighbouring scanlines.

The Fourier transform of a pulse with amplitude 1, starting at a and
ending at b is calculated as:

F(w) = I[a, b] exp(-jwt) dt = (2/w) sin(w(b-a)/2) exp(-jw(a+b)/2)

Next the modulation theorem is applied:

I[-inf, +inf] f(t) sin(w0 t) exp(-jwt) dt
    = (j/2) [F(w + w0) - F(w - w0)] = S(w)

Substitution of the above expression for F(w) and some rewriting
finally gives, with b-a = k*2PI/w0, an expression for |S(w)|^2 in
which, for w >> 1, k >> 1 and |w0 - w| << |w0 + w|, the highest peaks
occur in the term:

G(w) = (k*PI/w0)^2 * [sin(PI*x)/(PI*x)]^2,  with x = k*(w - w0)/w0

This simple formula for the term G(w) will be used as an approximation
of S(w). It should be kept in mind that this approximation is only
[...] S(w), but that is sufficient for the purposes of this discus-

[...]

tribution would then give Δf = B/N (N pixels in a vertical line of the
image). Combination of the above formulas yields (B*T)/(N^2) >= x1.
Taking x1 = 1.430, an estimated bandwidth of 5 kHz and a conversion
time of 1 second yields N <= SQRT(B*T/x1) = 59. Therefore the digital
design is preferably made with N = 64 (a power of 2 for easy design)
to build a machine capable of achieving this theoretical maximum
resolution estimate for a 1 second conversion time. A further
embodiment preferably uses a non-equidistant set of frequencies to
obtain a better compromise with the frequency sensitivity of the [...]
which actually leads to a [...] of resolution at the [...] of the
spectrum [...] tested experimentally. [...] the human brain is [...]
of "filling in" details that may overcome a lack of local resolution,
but that are suggested by the more global patterns. The brain can do a
lot of image processing to extract useful information from noisy and
blurred images. For example, vertices of a polygon can be suggested by
incomplete edges. So care must be taken in the application of
resolution estimates based only on local information and without a
priori knowledge. These estimates may be too pessimistic.

IMAGE PROCESSING UNIT

A rather detailed description of the design of an image processing
unit and the resulting architecture is given in FIG. 4. The stream of
data through the system will be followed to obtain a better insight in
its operation. It is a pipelined system, such that major parts of the
system are kept busy at all times. When a particular stage has
completed an operation and shifted the result to the next stage, the
next data item is shifted in from the previous stage for processing,
and so on.

In FIG. 4 system buses are drawn having a width proportional to the
number of bitlines. Because in the design there is no clearcut
separation between the concepts of data buses and address buses, no
distinction is made in this [...] data [...] is one of the [...]
addresses [...]. Approximate coordinates in FIG. 4 are indicated by
two character Xy pairs. Apart from Xy coordinates also

M °-?he behaviour of |G(a>)|. scaled to its maximum, is mnemonic names are given of the components as used in 
given by the ^ function £»f ^elEitemclock INVCK, DIVCK (at Ne) is 

lS preferably driven y**™^*^ « * 

be approximated quite well by |l/(Pl.x |. The next 50 «PI»"gn xorner (at Zk) t^ dual ^bitbu^ count- 
largest maxima ate found at x- '-/+ 1.430 with size « C b ^T" 3 ™ md,c * tcd th * t « dnvcn b / $yttcm 
0.217, i.e. a fifth of the main maximum. It seems reason- «**k. These counters generate addresses for phase and 
able to take as a rule of thumb, that a neighbouring £«*e. change memories and also for video memory, 
frequency »««0+Aa>0 should give an |x|-value no The six Isb s. i.e. least significant bits, of these addresses 
leu than xl = 1.430. At this limit the main maximum of 55 *lways indicate a particular escalator (phase and phase 
the neighboring pixel spectrum with basic frequency change memory) and its corresponding pixel position 
<*0+A*0is located at the frequency of one of the next- memory). During image-to-sound conversion, 

to-main maxima of the original pixel spectrum with one the counters are normally configured as one large 23-bit 
of the next-to-main maxima of the original pixel spec- ripple-carry counter. During frame grabbing, the multi- 
tniro with basic frequency wO. So k.Aw0/«0>*=xl, 60 plexer MUX bypasses the 7-bit middle counter CNT2 to 
when considering positive differences. Let the resolu- give a 128-fold increase in the frequencies of thc most 
tion to be obtained be NxN pixels. First the horizontal significant counter bits (the bottom counter). This is 
time separation between pixels is calculated. With a needed to grab a video frame within the 20 ms (30 Hz) 
tingle image conversion time T seconds, there are T/N television signal single frame time and thus avoid 
seconds per pixel: T/N=b-a=k # 2PI/<i*0.»> with 65 blurred images. The six msb's, i.e. most significant bits, 
<D=2PI.f this gives k*=fl).T/N. The relation of the bottom counter CNT1 always indicate a particu- 
k.Afl)/fl)>«xl should hold. Let the useful auditory lar vertical scanline (horizontal position). The middle 
bandwidth be give by 8. An equidistant frequency dis- counter just ensures that it takes some time (and sound 



09/15/2003, EAST Version: 1.04.0000 



samples) before the next vertical scan line is going to be 
convened inio sound. The 5 MHz input to the top 
counter CNT3 causes the output of the top counter to change every
500 ns. During image-to-sound conversion the output of the bottom
counter changes every 16.4 ms, so the conversion of the whole image,
i.e. 64 vertical scanlines, takes about 1.05 s. This can be most easily
changed into a 2.1 second conversion time by using the full 8 bits of
the middle counter, which is the purpose of the switch So at Yi. In that
case the counters are configured as one large 24-bit counter. However,
in the discussion a 7-bit middle counter is assumed (1.05 s conversion
time), giving a 23-bit total counter (bit 0 through 22), unless stated
otherwise. Addresses generated by the counters go unlatched to phase
change EPROMs DFI1,2 (at Ri), while they are latched by L1CNT, L2CNT
before going to phase SRAMs FI1,2 (at Ni). Thus care has been taken of
the fact that the EPROMs are much slower (250 ns) than the SRAMs
(150 ns). Therefore the EPROMs receive their addresses 250 ns earlier. A
phase change of a particular oscillator read from the EPROMs is added to
a present phase read from the SRAMs. Summation takes place in 4-bit full
adders AD1-4 (at Bi-Ki), and the result is latched by octal latches
L1FI, L2FI (at Cj, Ij) before being rewritten into the SRAMs. The new
phase is also sent through latches L3FI, L4FI (at Dg), together with
4-bit pixel brightness information coming from video SRAMs PIX1-4 (at
Xb-Xe). After a possible negation (ones complement) by exclusive-ORs
XOR1-3 (at Bf-Gf), the phase and brightness are used as an address for
the sine EPROMs SIN1,2 (at Kf). These give a sine value belonging to a
phase range 0...PI/2 (1st quadrant), and scaled by the brightness value.
The whole 2PI (four quadrant) phase range is covered by complementing
the phase using exclusive-ORs and by bypassing the sine EPROMs with an
extra sign bit through a line passing through a D-flipflop DFF2 at Ae;
this flipflop gives a delay ensuring that the sign bit keeps pace with
the rest of the sine bits. This sign bit determines whether the ALUs
ALU1-5 (at Mc-Ac) add or subtract.

The ALUs combine the results of all 64 emulated oscillators in one
superposition sample. The latches L1SIN, L2SIN, L3SIN at Nd, Id, Cd are
just for synchronization of the adding process. When the superposition
has been obtained after 64 system clock cycles, the result is sent
through latches L1DAC, L2DAC (at Oa) to a 16-bit digital-to-analog
converter DAC (at Qa). The inverter INV1 at the bottom of FIG. 4 serves
to give an offset to the summation process by the ALUs after clearing
the latches at Cd, Id, Nd. The DAC (hexadecimal) input range is 0000H to
FFFFH, so the starting value for the addition and subtraction process
should be halfway at 8000H to stay within this range after adding and
subtracting 64 scaled sine samples. The design as indicated keeps the
superposition almost always within this range without modulo effects
(which would occur beyond 0000H and FFFFH), even for bright images. This
is of importance, because overflows cause a distracting clicking or
cracking noise.

The average amplitude of the superposition will grow roughly with the
square root of the number of independent oscillators times the average
amplitudes of these oscillators. This can be seen from statistical
considerations when applying the central limit theorem to the oscillator
signals and treating them as stochastic variables, and simplifying to
the worst case situation that all oscillators are at their amplitude
value (+ or -). Therefore the average amplitude of such a 64 oscillator
superposition will be about 8 times the amplitude of an individual
oscillator, also assuming equal amplitudes of all oscillators as for a
maximum brightness image. This factor 8 gives a 3-bit shift, which means
that there must be provisions for handling at least 3 more bits. This is
the purpose of ALU5 (at Ac), which provides 4 extra bits together with
part of L3SIN (at Cd). The output of the DAC (at Qa) is sent through an
analog output stage, indicated only symbolically by an operational
amplifier OPAMP (at Sa). Finally the result reaches headphones (at Ta).
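The data path described above can be sketched in software. The following is a minimal illustration only, not the hardware design: it assumes a 2 MHz clock shared by 64 oscillators, a 16-bit phase accumulator truncated to 12 bits for the sine lookup, and a 4-bit brightness used as a linear scale factor. The function names are hypothetical, and the quadrant folding done by the exclusive-ORs and sign bit is replaced by a direct call to `math.sin`.

```python
import math

N_OSC = 64                          # emulated oscillators, one per pixel in a scanline
PHASE_BITS = 16                     # width of the accumulated phase
SINE_BITS = 12                      # phase bits that address the sine table
SAMPLE_RATE = 2_000_000 / N_OSC     # 31.25 kHz: one audio sample per 64 clock cycles
PHASE_MASK = (1 << PHASE_BITS) - 1


def phase_increment(freq_hz):
    """Phase change per audio sample, as it would be stored in the phase change memory."""
    return round(freq_hz * (1 << PHASE_BITS) / SAMPLE_RATE) & PHASE_MASK


def scaled_sine(phase16, brightness):
    """Sine of the truncated phase, scaled by a 4-bit brightness (0..15).

    Linear scaling by brightness/15 is an assumption of this sketch.
    """
    phase12 = phase16 >> (PHASE_BITS - SINE_BITS)
    return (brightness / 15) * math.sin(2 * math.pi * phase12 / (1 << SINE_BITS))


def next_sample(phases, increments, brightness):
    """Advance all oscillators by one step and return their superposition."""
    total = 0.0
    for i in range(N_OSC):
        phases[i] = (phases[i] + increments[i]) & PHASE_MASK   # adders plus phase SRAM
        total += scaled_sine(phases[i], brightness[i])          # sine lookup plus ALUs
    return total
```

In the hardware the 64 contributions are produced sequentially, one per system clock cycle, which is what the inner loop stands for.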

(*) Numerical calculations on sine superpositions showed that for a very
bright image field, 3 bits would cause overflow during 16% of the time,
whereas 4 bits would cause overflow during 0.5% of the time.
Experimentally, this appears to be disturbing. Overflows for large and
very bright image parts are heard as a disturbing "cracking" sound.
Division of all sine values in the EPROMs by a factor of 4 cures this
overflow problem with no noticeable loss of sound quality (a 16-bit sine
value would have been rather redundant anyway).
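The overflow percentages quoted above can be approximated with a short calculation. This is a sketch under stated assumptions, not the numerical method used for the figures in the text: it treats the sum of 64 equal-amplitude sines with independent phases as a Gaussian variable (central limit theorem) and takes an overflow to mean that the sum exceeds 2^bits times the amplitude of one oscillator. The function name is hypothetical.

```python
import math


def overflow_fraction(n_osc, extra_bits):
    """Fraction of time a sum of n_osc unit-amplitude sines exceeds 2**extra_bits.

    Gaussian approximation: each sine has rms 1/sqrt(2), so the sum has
    standard deviation sqrt(n_osc / 2).
    """
    sigma = math.sqrt(n_osc / 2)
    limit = 2 ** extra_bits
    # Two-sided tail probability of a zero-mean Gaussian beyond +/- limit.
    return math.erfc(limit / (sigma * math.sqrt(2)))


print(f"3 extra bits: {overflow_fraction(64, 3):.1%}")
print(f"4 extra bits: {overflow_fraction(64, 4):.1%}")
```

Under these assumptions the two results come out close to the 16% and 0.5% stated above.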

In the above a particular oscillator sample has more 
or less been followed through the system. When the 
image-to-sound conversion is complete, the frame grab- 
bing process starts, triggered by bit 21 of counter 
CNT1-3. 

An analog video signal from an image sensing unit, such as a camera,
indicated symbolically at Tb, is sent through a sample-and-hold circuit
SAM, indicated in FIG. 4 as part of the ADC block at Uc, and converted
to a 4-bit digital signal in Gray code by an A/D converter AD1-15 (at
Uc). A more detailed illustration of a sample-and-hold circuit is given
in FIG. 8. The Gray code serves to reduce the probability of getting
very inaccurate results due to transition states (spikes and glitches):
in subsequent Gray-coded numbers only one bit at a time changes. The
4-bit code is then stored in video SRAM, i.e. PIX1-4 (at Xb-Xe), which
receives its addresses from counters CNT1-3 (at Zg-Zk). Two bits of this
address are used for chip selection by a demultiplexer DMX (at Zc).
Components around Rd form the control logic (or "random" logic) that
takes care of detailed timing, synchronization and mode switching (frame
grabbing versus image-to-sound conversion). This logic is depicted in
more detail in FIG. 5. The meaning of the symbols in the control logic
is the following. The taus represent triggerable delays as monostable
flip flops (monoflops) MON1 and MON2. Small sigmas represent the
horizontal and vertical synchronization pulses of the video signal,
ADHSNC and ADVSNC. The deltas represent differentiating subcircuits DIFF
(illustrated in FIG. 9a) that generate a three-gate-delay spike output
after the trailing edge of an input pulse as shown in FIG. 9b. This
spike output is long enough to trigger subsequent circuitry.
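The Gray-code property relied on above, that successive codes differ in exactly one bit, can be checked with a few lines. This is an illustrative sketch of the standard reflected binary Gray code; whether the comparator network produces exactly this code assignment is an assumption, and the function names are hypothetical.

```python
def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# The 16 codes of a 4-bit brightness value, in order of increasing brightness.
codes = [binary_to_gray(level) for level in range(16)]
for a, b in zip(codes, codes[1:]):
    # Neighbouring brightness levels differ in a single bit, so a sample taken
    # during a transition is off by at most one level.
    assert bin(a ^ b).count("1") == 1
```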

In a prototype implementation according to the description in the
invention, the whole image processing unit, i.e. the whole circuit for
frame grabbing and image-to-sound conversion, has been implemented on a
single circuit board measuring only 236 x 160 mm. This proves that the
construction of a modest-sized portable system is possible. The only
devices not on the board are the camera and the power supply. The camera
that is used in the prototype system is the camera of the Philips
12TX3412 observation system. The power consumption of the system
appeared to be quite low: a measured 4.4 W (5 V, 0.83 A and 12 V, 21 mA)
excluding the camera. This is obtained while using mainly LS TTL
components. So the system is obviously practical for battery operation.
The power consumption could be reduced even much further by applying
CMOS ICs. LS TTL is attractive because of its combination of low cost
and high electronic stability. The vidicon camera of the 12TX3412 has a
specified power consumption of 2.7 W. This too could be much reduced by
using a CCD camera instead, having a typical 100 mW power consumption.
Still, even the total power consumption of 7.1 W could be supplied by a
sufficient number of battery monocells (each 1.2 V, 4 Ah) so as to last
more than six hours, which should be more than enough for a full day.
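The battery estimate can be made concrete with a small calculation. This sketch assumes ideal cells and ignores regulator and conversion losses, which the text does not quantify; the function name is hypothetical.

```python
import math


def cells_needed(power_w, hours, cell_volts=1.2, cell_amp_hours=4.0):
    """Number of monocells whose total energy covers power_w for the given time."""
    energy_needed_wh = power_w * hours
    energy_per_cell_wh = cell_volts * cell_amp_hours   # 4.8 Wh per cell
    return math.ceil(energy_needed_wh / energy_per_cell_wh)


print(cells_needed(7.1, 6))        # processing unit plus vidicon camera
print(cells_needed(4.4 + 0.1, 6))  # processing unit plus a 100 mW CCD camera
```

Under these assumptions, nine cells cover six hours at 7.1 W, and six cells suffice when the vidicon camera is replaced by a 100 mW CCD camera.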

To obtain 64 different oscillator frequencies, at the very least 64
different values are needed in the phase change memory, i.e. 6 bits. But
to be able to vary the differences in frequency for successive
oscillators, more flexibility is needed. For example, if those
differences are to increase linearly with oscillator number, 63
different phase change differences have to be represented, rounded to 64
for convenience. Then 6 bits would be needed to indicate a basic
oscillator phase change, and another 6 bits to represent the difference
in phase change from the other oscillators' phase changes. So 12 bits
phase change accuracy is the minimum for a set of frequencies in which
the differences between successive frequencies vary linearly with
oscillator number. A good choice is the use of 16 bits accuracy in order
to obtain extra freedom for non-linearly increasing frequency
differences.

As illustrated in FIG. 4, not all 16 phase bits are used in calculating
the sine values. The phase is truncated to its most significant 12 bits
to free 4 bitlines that carry amplitude information from video memory to
the sine memory for scaling. The truncation causes an error in the
phases. The size of the error depends on the values of the four least
significant phase bits. Because there is no intended special relation
between a 31.25 kHz system sample frequency and an emulated oscillator
signal, the four discarded bits will be considered as noise in
successive samples of a particular oscillator. This is phase noise,
resulting in frequency noise of the oscillator. The frequency noise must
be much less than the frequency difference between neighbouring
oscillators to prevent loss of resolution. The average frequency is
still specified by the 16 bits phase accuracy, but the actual signal
jumps a little around this frequency because of the truncation. This
frequency noise can be calculated. Let the system clock have frequency
F. Because the system clock supports 64 oscillators sequentially, F/64
oscillator samples per second are delivered. 12 bits specify a sine
value from the whole 2*PI radians period (10 bits specify PI/2 radians
in the sine EPROMs). The phase error resulting from not using more bits
(truncation) is therefore less than the least significant bit of the 12
bits, corresponding to a 2^-12 period phase step error when going from
one oscillator sample to the next. This in turn corresponds to a
frequency noise of (F/64)/2^12 = 7.63 Hz for F = 2 MHz. This is indeed
much less than the frequency differences between neighbouring
oscillators. For example, an equidistant frequency distribution of 64
oscillators over a bandwidth B gives steps in frequency of size
B/(64-1). Let B = 5 kHz, then this gives 79.4 Hz >> 7.63 Hz. Many
non-equidistant schemes also match the above condition for the frequency
noise.
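The figures in this estimate follow directly from the clock frequency and the bit widths. The snippet below simply repeats the arithmetic of the text; the function names are hypothetical.

```python
def frequency_noise_hz(clock_hz, n_osc=64, sine_phase_bits=12):
    """Upper bound on frequency noise caused by truncating the phase."""
    sample_rate = clock_hz / n_osc          # oscillator samples per second
    return sample_rate / 2 ** sine_phase_bits


def equidistant_step_hz(bandwidth_hz, n_osc=64):
    """Spacing between neighbouring oscillators spread evenly over the bandwidth."""
    return bandwidth_hz / (n_osc - 1)


noise = frequency_noise_hz(2_000_000)       # about 7.63 Hz
step = equidistant_step_hz(5_000)           # about 79.4 Hz
print(f"noise {noise:.2f} Hz, step {step:.1f} Hz, ratio {step / noise:.1f}")
```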

Because the phase calculations are 16 times more accurate (4 bits), it
is noted that a frequency in the phase change memory can be programmed
with a resolution of about 0.5 Hz, which gives more than enough
flexibility in choosing a non-equidistant frequency distribution.
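As an illustration of programming the phase change memory, the sketch below computes 16-bit phase increments for one possible non-equidistant distribution. The exponential spacing and the 500 Hz to 5 kHz range are assumptions chosen only for this example; the text does not prescribe a particular distribution, and the function names are hypothetical.

```python
SAMPLE_RATE = 2_000_000 / 64    # 31.25 kHz oscillator sample rate
PHASE_BITS = 16


def phase_increments(f_low, f_high, n_osc=64):
    """16-bit phase increments for exponentially spaced frequencies (assumed spacing)."""
    ratio = (f_high / f_low) ** (1 / (n_osc - 1))
    freqs = [f_low * ratio ** i for i in range(n_osc)]
    return [round(f * 2 ** PHASE_BITS / SAMPLE_RATE) for f in freqs]


def realized_frequency(increment):
    """Frequency actually produced by a given phase increment."""
    return increment * SAMPLE_RATE / 2 ** PHASE_BITS


increments = phase_increments(500.0, 5000.0)
resolution = SAMPLE_RATE / 2 ** PHASE_BITS   # about 0.48 Hz per increment unit
print(increments[0], increments[-1], round(resolution, 3))
```

Each realized frequency differs from its target by at most half the programming resolution, i.e. by less than 0.25 Hz.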

VIDEO FRAME GRABBING 

The mnemonic component names and two-character coordinates in the
following text still refer to FIG. 4. A conventional video camera can be
used on its side (rotated 90 degrees), such that image scanning may
actually take place from bottom to top for each scanline, and from left
to right for successive scanlines. This is just to have the same pixel
order for frame grabbing as for image-to-sound conversion. Then just a
single address generator suffices (the counter CNT1-3), which saves a
lot of components. Thus system size and cost can be reduced. In order to
avoid confusion, a frame grabbing system is described as if the camera
were not tilted. This means that the conventional names for video
signals are used. It should be kept in mind that with this convention a
horizontal scanline of the video frame will represent a vertical
scanline in the user image. The term "user" is used when indicating the
image the user perceives.

A video frame is scanned by the camera 50 times per second, assuming the
European PAL convention for convenience, as stated before, because other
television standards would only slightly alter the descriptions. The
scanning takes place from left to right (64 us per horizontal line, i.e.
15,625 Hz), and from top to bottom (312/313 lines). The 625-line PAL
standard black and white video signal applies interleaving, alternately
scanning 312 vertical lines in one 20 ms frame and 313 lines in the
other. Because only a vertical resolution (horizontal to the user) of 64
pixels is needed, one arbitrary 20 ms frame is grabbed to minimize
blurred images. The system frequency is independent of the camera
frequency, so synchronization is needed for frame grabbing. The frame
grabbing process is enabled when bit 21 of the counter CNT1-3 becomes
low, which happens when the previous image-to-sound conversion is
completed. The system then halts (clock disabled) until a vertical
synchronization pulse in the video signal occurs (ADVSNC). Subsequently
monoflop MON1 adds another 1.52 ms delay. This ensures that the top
margin of the frame is skipped (the left margin to the user, which is
also invisible on the monitor). The other monoflop MON2 is then enabled,
and triggered by a horizontal synchronization pulse in the video signal
(ADHSNC). This adds another 15.2 us delay to skip the left margin of the
frame (the bottom margin to the user, also invisible on the monitor).
Then the system clock is enabled, allowing the counter to count freely
while the first horizontal line (leftmost vertical user line) is
scanned. During one clock phase the video signal is sampled in a
sample-and-hold stage and converted into Gray code by a set of
comparators (not shown). During the other clock phase, when the 4-bit
code has stabilized, the resulting digital brightness value (grey-tone)
is latched and stored in video memory, while the sample-and-hold stage
is opened for capturing the next video sample (pixel). The clock is
disabled after 64 clock pulses, i.e. 32 us, which covers most of the
area visible on the monitor. By then the first 64 pixels have been
stored. After a horizontal synchronization pulse and a 15.2 us delay
from the monoflop the clock is enabled again for the second line etc.
Because of the counter configuration CNT1-3, with CNT2 bypassed, only
every fourth line of the video signal causes an [...] PIX1-4. So only
the last of every four lines is actually remembered for later use. The
others are overwritten. This is meant to make uniform use of the 312/313
line frame. Effectively grabbing only every fourth line ensures that the
64 vertical pixel resolution covers 64*4 = 256 lines of the frame, which
is almost the whole visible field of the camera. After 256 scanlines,
bit 21 of the counter becomes high, disabling the frame grabbing and
restarting the image-to-sound conversion.

Frame grabbing is indicated by the following steps A through F:

A: bit 21 becomes low, disabling the system clock
B: a vertical synchronization pulse triggers the first monoflop
C: the delayed pulse from the first monoflop enables the second monoflop
D: a horizontal synchronization pulse triggers the second monoflop
E: the second monoflop enables the system clock
F: bit 5 of CNT1 disables the system clock after every 64 pulses
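The effect of steps A through F on the video memory can be sketched as follows. This is a simplified software model, not the control logic itself: it assumes that four successive video lines are written to the same 64-pixel row of the memory, which is one way to obtain the behaviour described in the text (only the last of every four lines survives). The function name and the list-based memory are hypothetical.

```python
PIXELS_PER_LINE = 64    # 64 clock pulses of 500 ns, i.e. 32 us per grabbed line
LINES_GRABBED = 256     # video lines scanned while bit 21 is low
ROWS = 64               # rows kept in video memory


def grab_frame(video_lines):
    """Store 256 lines of 64 brightness values (0..15) into a 64 x 64 memory."""
    memory = [[0] * PIXELS_PER_LINE for _ in range(ROWS)]
    for line_no, line in enumerate(video_lines[:LINES_GRABBED]):
        row = line_no // 4                  # assumed: four lines share one row
        for pixel, value in enumerate(line[:PIXELS_PER_LINE]):
            memory[row][pixel] = value      # later lines overwrite earlier ones
    return memory


# Example input: every pixel of a line carries (line number mod 16).
frame = [[line_no % 16] * PIXELS_PER_LINE for line_no in range(LINES_GRABBED)]
stored = grab_frame(frame)
```

With this input, row 0 ends up holding the values of video line 3, row 1 those of line 7, and so on.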

CONTROL LOGIC 

Control logic or random logic having the function of controlling the
cooperation and synchronization of different circuit blocks and pipeline
stages is shown in more detail in FIG. 5. The labels A-F refer to the
positions in the picture of the above schematic overview.

A] Bit 21 becomes low, thereby giving a low clear pulse at DFF1.1 to
grab-enable flipflop GEFF after differentiation and inversion. This
forces NOR1.3 to remain or become high, which causes NOR1.1 to stay low,
as there is only one-sided differentiation. The constant high input at
DFF2.10 is therefore inactive. When bit 21 gets low, so does bit 5,
which causes a differentiation pulse at NOR2.4. Through inversion this
becomes a low clear pulse at DFF2.13 to the clock-enable flipflop CEFF.
The output DFF2.9 will then become low and disable the system clock,
DIVCK. Now only an external pulse can restart the system. The only
external pulses come from the synchronization signals at MON.5 and
MON.13. The horizontal sync pulse MON.5 cannot restart the system,
because OR.5 is high. So the system has to wait until a vertical sync
occurs at the time point indicated by label B, which after some delay
sets at label C the grab-enable flipflop by pulling DFF1.4 low for a
short time, so DFF1.6 becomes low. But then OR.6 will have a high-to-low
transition as soon as the delayed horizontal sync, position label D, at
MON.5 becomes low, label E. MON.5 could already be low, which would
cause a mutilated first scanline because of bad timing. However, the
first three of every four scanlines are overwritten, so this does not
matter.

E] OR.5 is constantly low and OR.4 just becomes low, thereby causing a
high pulse at NOR1.1, which after inversion sets the clock-enable
flipflop at DFF2.13. The system clock then starts running, and the
counter CNT1-3 can generate addresses for memory. 64 pixels are stored
in 64 system clock cycles, before counter bit 5 becomes low again (at
label F; bit 21 stays low for a much longer time) and clears the
clock-enable flipflop at DFF2.13 after differentiation (NOR2.4) and
inversion (INV1.12). The image processing unit now waits (halts) until
the next horizontal sync passes through MON.5, which enables the system
clock again at label E in the way described above, OR.5 still being low.
Then the next scanline with 64 pixels is stored in memory (PIX1-4),
until bit 5 becomes low again at label F, which disables the system
clock until the next delayed horizontal sync etc.

After 256 of these scans, effectively storing only every fourth line,
i.e. 64 lines in all, bit 21 becomes high again, thereby forcing OR.8 to
a constant high level, which prevents any further clock disable pulses
at DFF2.13 as long as bit 21 stays high. A constant read signal is
forced upon PIX1-4 for a high bit 21 through OR.11. This is the
image-to-sound conversion phase, for which the synchronization is much
less tricky than for the frame grabbing, during which two asynchronous
subsystems, i.e. the image sensing unit and image processing unit, have
to be synchronized. The major issues are now to prevent bus conflicts,
shift data through the stages of the pipeline and output an audio sample
every 64 system clock cycles. The latter task is handled at DFF1.12.
CNT1-3 count after the system clock signal goes low. It takes some time
before bit 5 can go low because of the ripple carry counter, so it takes
one extra clock cycle before DFF1.9 goes low. This delay prevents the
loss of the last (64th) oscillator contribution when the latches
L1SIN-L3SIN are cleared and the last audio sample is shifted to the
DA-converter through the latches L1DAC, L2DAC.

The exclusive-OR that controls the shift signals at LADC.11 prevents the
"wrap-around" of the latest, rightmost, pixel samples of a horizontal
scanline during frame grabbing. Without it, these samples would
contribute to the leftmost pixel samples, because the latch of the
AD-converter receives no shift pulses, nor does the sample-and-hold
circuit, when the system clock is disabled (clock is low). The latch
would therefore shift old information at its inputs when the system
clock is enabled again and the clock goes high after a while, because
the sample-and-hold gate opens only when the clock is high. The short
extra pulse through NOR1.1 and XOR3.11 at the restart of the system
clock flushes this old information and reloads the sample-and-hold
circuit with information, i.e. a voltage, associated with the leftmost
pixels. The time constant of the sample-and-hold circuit is
approximately 10 ns, which is sufficiently short for reloading a
reasonably accurate value for the first sample of a horizontal scanline,
because the three-gate-delay pulse of the differentiator has a duration
of approximately 30 ns. The AD-converter and Gray code encoder have
sufficient time to digitize this newly sampled value before the system
clock goes high after being enabled. So when the system clock becomes
high, a proper first pixel value is shifted through the latch LADC into
video memory PIX1-4.

In this section the timing of the image processing unit is considered in
more detail. The discussion will therefore become more dependent on
specific hardware choices. A closer look is needed at the image-to-sound
conversion operations that take place in parallel during each phase of
the 2 MHz system clock, giving 250 ns high (H) and low (L) levels. Time
delay estimates are taken from manufacturers' data books and data
sheets, mostly from Texas Instruments. Mnemonic names refer to the
circuit design described heretofore. The process numbers 1 through 5 in
front of the sets of operations taking place in parallel indicate the
route taken by a particular oscillator sample, i.e. they indicate the
stages in the pipeline. A sample starts at process 1, then proceeds to
process 2 etc. Processes 1, 3 and 5 take place in parallel at clock = L,
but they operate on different oscillator samples. The same applies to
processes 2 and 4 at clock = H.



Clock = L (250 ns)

*process 1:                                          Typical delay:
  increment counters CNT1-3                                  70 ns
  put count to address DFI EPROMs (not via latch)             0 ns
  use address to read DFI EPROMs                            250 ns
  Total:                                                    320 ns

The reading of DFI continues during process 2, so a total delay > 250 ns
is allowed for process 1. DFI will need another 70 ns of the next clock
phase H.

*process 3:
  shift sum to FI SRAMs, latches L1FI, L2FI                  30 ns
    (old address, count not yet shifted)
  write sum into FI SRAMs (TI p. 8-14)                      130 ns
  Total:                                                    170 ns
 * (simultaneously):
  shift brightness from PIX to address
    SIN1,2 EPROMs, latch [illegible]                  max (20 ns,
  * shift phase to exclusive-ORs XOR1-3,
    latches L3FI, L4FI                                       20 ns)
  exclusive-ORs give ones complement yes/no                  20 ns
  use result as address to read SIN1,2 EPROMs               250 ns
  Total:                                                    290 ns

The reading of SIN continues during process 4, so a total delay > 250 ns
is allowed for the second part of process 3. SIN will need another 40 ns
of the next clock phase H.

*process 5:
  shift superposition sample, latches L1SIN, L2SIN, L3SIN
  Total:
 * (only once every 64 clock cycles)
  clear superposition: clear latches L1SIN, L2SIN, L3SIN
  * shift superposition to DAC, latches L1DAC, L2DAC

The latter part of process 5 is triggered by a differentiating circuit
to stay well within the 250 ns limit. The 64th cycle is detected by
observing the delayed trailing edge of the 6th bit of counter CNT1.

Clock = H (250 ns)

*process 2:
  shift count to address FI SRAMs, latches L1CNT, L2CNT      20 ns
  use address to read FI SRAMs                        max (150 ns,
  * continue reading DFI EPROMs (process 1)                  70 ns)
  sum results FI and DFI in AD1-4                            50 ns
  Total:                                                    220 ns
 * (simultaneously):
  shift count to address PIX1-4 SRAMs, latches L1CNT, L2CNT  20 ns
  use address to read PIX SRAMs                             150 ns
  Total:                                                    170 ns

Both totals are less than the 250 ns clock phase available.

*process 4:
  continue reading SIN EPROMs (process 3)                    40 ns
  add result to obtain superposition sample, ALU1-5         100 ns
  Total:                                                    140 ns

In the schematic below the main actions taking place near the 64 clock
cycles boundary are indicated schematically. Superposition is
abbreviated to "sup". The numbers extending the words indicate the
associated oscillator, i.e. 0 through 63.

(Schematic with columns "clock" and "counter"; the remaining contents
are not recoverable.)



I claim: 

1. Transformation system for converting a visual 
image signal representative of pixels in sequential scan- 
lines through an image into an acoustic signal composed 
of sequential acoustic signal sections respectively corre- 
sponding to said scanlines, each acoustic signal section 
being formed of sequential output amplitude samples 
equally spaced apart by a predetermined time step and 
representing a superposition of signal contributions 
from respective pixels in the corresponding scanline, 
each signal contribution from a pixel being determined 
from a sinusoid frequency due to a position of said pixel
in the scanline and a sinusoid amplitude due to a brightness
of said pixel, said system comprising:
a processing unit having an image signal input for a 
current image and an output, said unit comprising: 
first memory means fed by said image signal input for 
storing sinusoid amplitudes respectively due to 
brightnesses of the pixels in a current scanline to be 
processed in the current image; 
second memory means for storing phase increments 
due to the pixel positions in the current scanline 
respectively corresponding to sinusoid frequencies 
due to the pixel positions in said current scanline 
times the predetermined time step; 
third memory means for storing cumulated phases for 
the signal contributions from the respective pixel
positions in said current scanline to a currently
processed output amplitude sample; 
first summing means responsive to data outputs of 
said first and second memory means for forming a 
current cumulated phase for the signal contribution 
from a currently processed pixel position in said 
current scanline as a sum of a last cumulated phase 
for the signal contribution from said currently pro- 
cessed pixel position, stored in said second memory 
means, and the phase increment due to said currently
processed pixel position, stored in said first
memory means, and for updating said third memory
means with said sum;
sine means responsive to a data output of the first
memory means and to the current cumulated phase
formed by the summing means for forming a signal
contribution to the currently processed output
amplitude sample due to the currently processed
pixel position as a product of the sinusoid amplitude
and a sinusoidal function of said current cumulated
phase for the currently processed pixel
position; and

second summing means responsive to the output of 
the sine means for forming a cumulative sum of 
contributions to the currently processed amplitude 15 
sample due to all pixel positions in the current 
scan line, having an output comprising the output of 
said image processing unit; 
image input means coupled to said image signal input 
for successively updating the current image at a
frame rate; and 
audio transducer means responsive to the output of 
said image processing unit for successively forming 
the acoustic signal sections corresponding to each 
successive current scanline.

2. The system as claimed in claim 1, wherein said sine
means comprises a ROM containing scaled sine values 
at memory addresses given by the combination of the 
current cumulated phase and sinusoid amplitude for the 
currently processed pixel position.

3. Transformation system as claimed in claim 1, char-
acterized in that said image input means comprises a
portable image sensing system for feeding said image 
signal input of said image processing unit. 

4. Transformation system as claimed in claim 1,
wherein said image input means updates said current 
image at a television frame rate and said image process- 
ing unit operates in real-time on successively updated 
current images. 

5. Transformation system as claimed in claim 4, char-
acterized in that said image processing unit comprises a
pipelined digital processing architecture. 

6. Transformation system as claimed in claim 1, 
wherein each scanline is a pixel column. 

7. Transformation method for converting a visual
image signal representative of pixels of sequential scan- 
lines through an image into an acoustic signal composed 
of sequential acoustic signal sections respectively corre-
sponding to said scan lines, each acoustic signal section
being formed of sequential output amplitude samples 
equally spaced apart by a predetermined time step and 
representing a superposition of signal contributions 
from respective pixels in the corresponding scanline, 
each signal contribution from a pixel being determined 
from a sinusoid frequency due to a position of said pixel 
in the scanline and a sinusoid amplitude due to a bright- 
ness of said pixel, said method comprising: 
first storing sinusoid amplitudes due to brightnesses
of pixels in a current scanline to be processed;
second storing phase increments due to the pixel 
positions in the current scanline respectively corre- 
sponding to the sinusoid frequencies due to the 
pixel positions in the scanline times the predeter- 
mined time step; 
third storing cumulated phases for the signal contri- 
butions from the respective pixel positions in said 
current scanline to a currently processed output 
amplitude sample; 
first forming a current cumulated phase for the signal 
contribution from a currently processed pixel posi- 
tion in said current scanline as a sum of the third 
stored cumulated phase for said currently pro- 
cessed pixel position and the second stored phase 
increment due to the currently processed pixel
position, and updating the third stored cumulative 
phase for the currently processed pixel position 
with said sum; 
second forming the signal contribution to the cur- 
rently processed output amplitude sample due to 
the currently processed pixel position as a product 
of the sinusoid amplitude and a sinusoidal function 
of said current cumulated phase for the currently 
processed pixel position; and 
third forming a cumulative sum of the second formed 
contributions to the currently processed amplitude 
sample due to all pixel positions in said scanline. 
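
The steps recited in claim 7 can be illustrated with a minimal software sketch. This is not the patented hardware implementation; it only mirrors the recited storing and forming steps. The function name, the argument names, the factor of 2*pi that turns "frequency times time step" into radians, and the zero starting phase are all assumptions made for illustration.

```python
import math


def synthesize_scanline(amplitudes, frequencies_hz, time_step_s, n_samples):
    """Return n_samples output amplitude samples for one scanline.

    amplitudes[i] is the sinusoid amplitude due to the brightness of pixel i;
    frequencies_hz[i] is the sinusoid frequency due to the position of pixel i.
    """
    # Second storing: one phase increment per pixel position
    # (frequency times time step; the 2*pi radian scaling is an assumption).
    phase_increments = [2.0 * math.pi * f * time_step_s for f in frequencies_hz]
    # Third storing: one cumulated phase per pixel position
    # (starting at zero is an assumption).
    phases = [0.0] * len(amplitudes)

    samples = []
    for _ in range(n_samples):
        total = 0.0  # third forming: cumulative sum over all pixel positions
        for i, amplitude in enumerate(amplitudes):
            # First forming: add the stored increment to the stored phase
            # and write the sum back.
            phases[i] = (phases[i] + phase_increments[i]) % (2.0 * math.pi)
            # Second forming: amplitude times a sinusoidal function of the phase.
            total += amplitude * math.sin(phases[i])
        samples.append(total)
    return samples
```

For example, a single pixel with amplitude 1.0 at 250 Hz and a 1 ms time step advances its phase by a quarter cycle per output sample.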

8. The method as claimed in claim 7, wherein said 
second forming comprises addressing a ROM contain- 
ing scaled sine values at an address formed by the com- 
bination of the current cumulated phase and sinusoid 
amplitude for the currently processed pixel position. 
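
The ROM addressing of claim 8 can be sketched as a lookup table whose address combines a quantized amplitude with a quantized phase, so that the multiplication and the sine evaluation are both replaced by a single read. The bit widths, the output scale, and all names below are hypothetical; the claim does not specify them.

```python
import math

PHASE_BITS = 8   # assumed phase resolution: 256 steps per cycle
AMP_BITS = 4     # assumed amplitude resolution: 16 grey levels
SCALE = 127      # assumed full-scale value of a stored entry


def build_sine_rom():
    """Precompute scaled sine values for every (amplitude, phase) pair."""
    rom = []
    max_amp = (1 << AMP_BITS) - 1
    for amp_index in range(1 << AMP_BITS):
        for phase_index in range(1 << PHASE_BITS):
            angle = 2.0 * math.pi * phase_index / (1 << PHASE_BITS)
            rom.append(round(SCALE * (amp_index / max_amp) * math.sin(angle)))
    return rom


def rom_address(amp_index, phase_index):
    """Combine amplitude (high bits) and phase (low bits) into one address."""
    return (amp_index << PHASE_BITS) | phase_index


ROM = build_sine_rom()


def contribution(amp_index, phase_index):
    """Signal contribution of one pixel, read directly from the table."""
    return ROM[rom_address(amp_index, phase_index)]
```

With these assumed widths the table holds 4096 entries, and a zero-amplitude (dark) pixel reads zero at every phase.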

9. A method as claimed in claim 7, wherein the sinus- 
oid frequency associated with each pixel position in- 
creases geometrically with change of pixel position in a 
predetermined direction along a scan line. 
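
A geometric frequency progression as in claim 9 means that adjacent pixel positions differ by a constant frequency ratio rather than a constant difference. A minimal sketch follows; the function name, the choice of endpoint frequencies, and the convention that index 0 carries the lowest frequency are assumptions for illustration only.

```python
def geometric_frequencies(n_pixels, f_low_hz, f_high_hz):
    """Return n_pixels frequencies rising geometrically from f_low_hz to f_high_hz."""
    if n_pixels < 2:
        raise ValueError("need at least two pixel positions")
    # Constant ratio between the frequencies of adjacent pixel positions.
    ratio = (f_high_hz / f_low_hz) ** (1.0 / (n_pixels - 1))
    return [f_low_hz * ratio ** i for i in range(n_pixels)]
```

With five pixel positions between 100 Hz and 1600 Hz, each step approximately doubles the frequency.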

* * * * *